1 Map of Sampling Stations in the East China Sea

Load package:

library(marmap)

Import bathymetry data from NOAA:

b = getNOAA.bathy(lon1 = 115, lon2 = 140, lat1 = 35, lat2 = 20, resolution = 1)
## Querying NOAA database ...
## This may take seconds to minutes, depending on grid size
## Building bathy matrix ...

Import dataframe with the latitude and longitude of sampling points (in decimal degrees):

pts1 <- read.csv("~/desktop/MiraiDPI/SamplingLatLong.csv")
pts1$Station<- as.character(pts1$Station)
str(pts1)
## 'data.frame':    10 obs. of  3 variables:
##  $ Lon    : num  126 127 126 125 124 ...
##  $ Lat    : num  26.3 25.9 26.5 26 25.5 ...
##  $ Station: chr  "2" "4" "5" "8" ...
pts_photos <-  read.csv("~/desktop/MiraiDPI/SamplingLatLong2.csv")
## Warning in read.table(file = file, header = header, sep = sep, quote =
## quote, : incomplete final line found by readTableHeader on '~/desktop/MiraiDPI/
## SamplingLatLong2.csv'
pts_photos$Station<- as.character(pts_photos$Station)
str(pts_photos)
## 'data.frame':    4 obs. of  3 variables:
##  $ Lon    : num  128 124 129 130
##  $ Lat    : num  25.4 24.9 28.8 29
##  $ Station: chr  "3" "10" "15" "17"

Make dataframe with map labels and their lat/long:

Clon = c(127.5,131.25, 121.6, 121.03, 120.8, 126)
Clat= c(26, 31.93, 31.1, 23, 27.2, 22)
CLabels= c("Okinawa", "Japan", "China", "Taiwan", "East China Sea", "Philippine Sea")
Countries = cbind.data.frame(Clon, Clat, CLabels)
str(Countries)
## 'data.frame':    6 obs. of  3 variables:
##  $ Clon   : num  128 131 122 121 121 ...
##  $ Clat   : num  26 31.9 31.1 23 27.2 ...
##  $ CLabels: Factor w/ 6 levels "China","East China Sea",..: 4 3 1 6 2 5

Make plot one layer at a time:

#define plot colors
blues <- c("lightsteelblue4", "lightsteelblue3", "lightsteelblue2", "lightsteelblue1")
greys <- c(grey(0.6), grey(0.93), grey(0.99))
#tiff("~/desktop/MiraiDPI/MiraiFiltersMap.tiff", height= 8, width =8, units = 'in', res=300) # uncomment to save plot to file
plot(b, image = TRUE, land = TRUE, xlim=c(120,135), ylim=c(22, 32), lwd = 0.03, bpal = list(c(0, max(b), greys), c(min(b),0 , blues)), cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5) #bathymetry colors
plot(b, lwd = 0.8, deep = 0, shallow = 0, step = 0, add = TRUE) # highlight coastline with solid black line
plot(b, deep=-200, shallow=-200, lwd = 0.4, drawlabels=T, add=T)  # Add -200m isobath
plot(b, deep=-1000, shallow=-1000, lwd = 0.4, drawlabels=T, add=T)  # Add -1000m isobath
plot(b, deep=-2000, shallow=-2000, lwd = 0.4, drawlabels=T, add=T)  # Add -2000m isobath
plot(b, deep=-3000, shallow=-3000, lwd = 0.4, drawlabels=T, add=T)  # Add -3000m isobath
points(pts1, pch = 21, bg = 'white', col = 'red') # Add sampling points
points(pts_photos, pch = 19, col = 'red')  #Add points for stations with photo profiles
text(pts1, labels=pts1$Station, adj= 1.5, cex=1.2) # Add sampling station numbers
text(pts_photos, labels=pts_photos$Station, adj= 1.5, cex=1.2) # add photo station numbers
text(Countries, labels=Countries$CLabels, cex= 1.7, adj = -0.2) # Add map labels 

#dev.off() #uncomment to save plot to file

2 High-throughput sequence analysis

2.1 Identify and Count Amplicon Sequence Variants (ASVs)

Sample collection, DNA extraction, PCR, & sequencing

Seawater collected from the Subsurface Chlorophyll Maximum (SCM), mid-water column, and near-bottom was collected in 10 L Niskin bottles at each sampling station and surface seawater was collected by bucket. Five Liter replicates from SCM, mid, and near-bottom water samples and 4.5 L replicates from surface samples were sequentially filtered through 10.0 and 0.2 µm pore-size Polytetrafluoroethylene (PTFE) filters. Filters were flash frozen in liquid nitrogen and stored at -80 ºC until DNA extraction. DNA was extracted following the manufacturers protocols for the Qiagen/MoBio DNeasy PowerWater Kit, including the optional heating step to lyse cells. Amplicon PCR and sequencing library preparation followed the procedures described in the Illumina 16S Metagenomic Sequencing Library Preparation manual modified only to include universal eukaryotic primers for the v4 region of the 18S rRNA gene (F: Stoeck et al. 2010, R: Brisbin et al. 2018) and amplicon PCR conditions most appropriate for these primers. The 220 samples were radomly assigned to 4 sequencing runs on the Illumina MiSeq sequencing platform with v3 chemistry for 300x300-bp paired-end sequencing.

Sequences are available from the NCBI Sequencing Read Archive (SRA) with accession PRJNA546472.

Bioinformatic processing with Qiime2

Demultiplexed paired-end sequences provided by the OIST sequencing center were imported to Qiime2 (v2018.11) software (Bolyen et al. 2018). The Divisive Amplicon Denoising Algorithm (DADA) was implemented with the DADA2 plug-in for Qiime2 for quality filtering, chimera removal, and feature table construction (Callahan et al. 2015) separately for each sequencing run before merging the four resulting feature tables. Taxonomy was assigned to Amplicon Sequence Variants (ASVs) in the feature table by training a classifier on the Protist Ribosomal Reference (PR2) database to classify ASVs (Guillo et al. 2012). The feature table and assigned taxonomy were then exported from Qiime2 for further analysis.

Below is an example Qiime2 workflow for 1 sequencing run:

Activate an anaconda environment for qiime2: source activate qiime2-2018.11

Import sequencing data (all sequences for eacg sequencing run must be in their own directory): qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path ./seqs/seqrun1/ --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path run1_demux-paired-end.qza

Visualize squencing results (use these graphs to decide what length to trim sequences) qiime demux summarize --i-data run1_demux-paired-end.qza --o-visualization run1_demux.qzv

Run the DADA2 algorithm: qiime dada2 denoise-paired --i-demultiplexed-seqs run1_demux-paired-end.qza --output-dir ./dd2run1 --o-representative-sequences run1_rep-seqs --p-trim-left-f 10 --p-trim-left-r 10 --p-trunc-len-f 260 --p-trunc-len-r 250 --p-n-threads 3

Check results: qiime feature-table summarize --i-table ./dd2run1/table.qza --o-visualization ./dd2run1/table.qzv --m-sample-metadata-file ~/desktop/MiraiFilters/sampleMaptxt.txt

qiime metadata tabulate --m-input-file dd2run1/denoising_stats.qza --o-visualization dd2run1/denoising_stats.qzv

Merge feature tables from multiple sequencing runs: qiime feature-table merge --i-tables ./dd2run1/table.qza --i-tables ./dd2run2/table.qza --i-tables ./dd2run3/table.qza --i-tables ./dd2run4/table.qza --o-merged-table mergedtable.qza

qiime feature-table merge-seqs --i-data run1_rep-seqs.qza --i-data run2_rep-seqs.qza --i-data run3_rep-seqs.qza --i-data run4_rep-seqs.qza --o-merged-data merged-rep-seqs.qza

Import taxonomy database: qiime tools import --type 'FeatureData[Sequence]' --input-path pr2_version_4.11.1_mothur.fasta --output-path pr2.qza

qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path pr2_version_4.11.1_mothur.tax -output-path pr2_tax.qza

Select v4 region from PR2 sequences: qiime feature-classifier extract-reads --i-sequences pr2.qza --p-f-primer CCAGCASCYGCGGTAATTCC --p-r-primer ACTTTCGTTCTTGATYRA --o-reads pr2_v4.qza

Train the classifier: qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads pr2_v4.qza --i-reference-taxonomy pr2_tax.qza --o-classifier pr2_v4_classifier.qza

Classify: qiime feature-classifier classify-sklearn --i-classifier pr2_v4_classifier.qza --i-reads merged-rep-seqs.qza --o-classification taxonomy.qza

qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv

Export the .tsv taxonomy file from taxonomy.qzv for use in following steps.

2.2 Load data into R & prepare phyloseq objects

Load packages:

library(tidyverse)
#devtools::install_github("jbisanz/qiime2R")
library(qiime2R)
library('phyloseq')
library('vegan')
library('ggplot2')
library(RColorBrewer)
library(DESeq2)
library(reshape)
library("wesanderson")
library("gridExtra")

Set default colors:

cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
P <- brewer.pal(12, "Paired")
S2 <- brewer.pal(8, "Set2")
S1 <- brewer.pal(8, "Set1")
D2 <- brewer.pal(8, "Dark2")
colors<- c(P, S2, S1,D2)
colors2 <- rep(colors, 6)

g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  legend
}

Convert Qiime2 artifacts to phyloseq objects:

phyloseq<-qza_to_phyloseq(features="~/desktop/MiraiFilters/mergedtable.qza")
## Warning in read_qza(features): Artifact was not generated with Qiime2 2018.4,
## if data is not successfully imported, please report here github.com/jbisanz/
## qiime2R/issues

Import sample data:

metatable <- read.csv("~/desktop/MiraiFilters/SampleMap.csv")
row.names(metatable) <- metatable[[1]]

metatable$Station <- as.character(metatable$Station)
metatable$filter <-as.factor(metatable$filter)

META<- sample_data(metatable)
str(META)
## 'data.frame':    212 obs. of  32 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
##   ..@ .Data    :List of 32
##   .. ..$ : Factor w/ 212 levels "A6-F173-DNA",..: 3 6 9 12 96 97 115 118 129 134 ...
##   .. ..$ : Factor w/ 212 levels "A6-F173-DNA",..: 3 6 9 12 96 97 115 118 129 134 ...
##   .. ..$ : chr  "2" "2" "2" "2" ...
##   .. ..$ : Factor w/ 8 levels "1","31","32",..: 7 7 8 8 1 1 4 4 6 6 ...
##   .. ..$ : Factor w/ 14 levels "C02","C03","C04",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 4 levels "Bottom","Mid",..: 4 4 4 4 1 1 2 2 3 3 ...
##   .. ..$ : Factor w/ 5 levels "Bottom","Deep",..: 5 5 5 5 1 1 3 3 4 4 ...
##   .. ..$ : Factor w/ 2 levels "D","S": 2 2 2 2 1 1 1 1 2 2 ...
##   .. ..$ : Factor w/ 109 levels "10Bott1","10Bott2",..: 71 71 72 72 65 65 67 67 69 69 ...
##   .. ..$ : int  0 0 0 0 1000 1000 700 700 92 92 ...
##   .. ..$ : Factor w/ 2 levels "2","10": 2 1 2 1 2 1 2 1 2 1 ...
##   .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : Factor w/ 4 levels "KG","MOT","NOT",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 3 levels "KG","NOT","SOT": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 3 levels "n","p","y": 3 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : Factor w/ 4 levels "KG","NOT","O",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : num  0 0 0 0 1056 ...
##   .. ..$ : num  25.25 25.25 25.25 25.25 4.46 ...
##   .. ..$ : num  34.7 34.7 34.7 34.7 34.4 ...
##   .. ..$ : num  202.2 202.2 202.2 202.2 69.8 ...
##   .. ..$ : num  23.1 23.1 23.1 23.1 27.3 ...
##   .. ..$ : num  0.0698 0.0698 0.0698 0.0698 0.051 ...
##   .. ..$ : num  0.061 0.061 0.061 0.061 1.107 ...
##   .. ..$ : num  0.521 0.521 0.521 0.521 11.128 ...
##   .. ..$ : num  0.06 0.06 0.06 0.06 37.51 ...
##   .. ..$ : num  0.01 0.01 0.01 0.01 0.02 0.02 0 0 0.11 0.11 ...
##   .. ..$ : num  1.05 1.05 1.05 1.05 111.04 ...
##   .. ..$ : num  0.033 0.033 0.033 0.033 2.666 ...
##   .. ..$ : num  0.05 0.05 0.05 0.05 0.03 0.03 0.03 0.03 0.04 0.04 ...
##   .. ..$ : num  26.3 26.3 26.3 26.3 26.3 ...
##   .. ..$ : num  126 126 126 126 126 ...
##   ..@ names    : chr  "X" "SampleID" "Station" "Bottle" ...
##   ..@ row.names: chr  "A8-F17-DNA" "B8-F18-DNA" "C8-F19-DNA" "D8-F20-DNA" ...
##   ..@ .S3Class : chr "data.frame"

Load PR2 taxonomy:

taxonomy <- read.csv("~/desktop/MiraiFilters/taxonomy.csv", stringsAsFactors = FALSE)
names(taxonomy) <- c("row", "tax", "Confidence")
row.names(taxonomy) <-taxonomy[[1]]
taxonomy <- taxonomy[,(-1)]
taxonomy <-  separate(taxonomy, tax, c("D0","D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9", "D10", "D11", "D12", "D13", "D14"), sep = ";", fill = "right")
taxonomy <- taxonomy[,c(1:8)]
taxmat <- as.matrix(taxonomy)
TAX = tax_table(taxmat)
str(TAX)
## Formal class 'taxonomyTable' [package "phyloseq"] with 1 slot
##   ..@ .Data: chr [1:23115, 1:8] "Eukaryota" "Eukaryota" "Eukaryota" "Eukaryota" ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:23115] "000a220fe8edd771dd82f6915627ef3e" "000c79760dd16d6692018deb8ea1b164" "000d9eaa21eaff08f34fb1a1a87601d5" "000e6f109e01181b2b1eda5d6c99319c" ...
##   .. .. ..$ : chr [1:8] "D0" "D1" "D2" "D3" ...

Add taxonomy to phyloseq object:

ps = merge_phyloseq(phyloseq, TAX, META)

Some basic statistics:

#make ASV dataframe
OTUs <- data.frame(otu_table(ps))

Total number of ASVs in the data set:

OTUsRS<- OTUs
OTUsRS$RowSum <- rowSums(OTUsRS)
OTUsRSnoZero <- OTUsRS$RowSum!=0
sum(OTUsRSnoZero)
## [1] 22656

Total Number of Sequences:

sum(OTUsRS$RowSum)
## [1] 16340248

merge ASV table with taxonomy:

otusWtax <- merge(OTUs, taxonomy, by = 'row.names')

keep only Acantharian ASVs:

otusA <-  filter(otusWtax, D3 == "Acantharea") 

Number of acantharian ASVs in the data set:

OTUsRS<- otusA[,-c(1,213:220)]

OTUsRS$RowSum <- rowSums(OTUsRS)
OTUsRSnoZero <- OTUsRS$RowSum!=0
sum(OTUsRSnoZero)
## [1] 1053

Acantharian ASV Richness:

OTUs0 <- OTUsRS!=0 #is this number not a zero? true (1) or false (0)
csums <- colSums(OTUs0) # col sums = observed ASV richness
csumdf <- as.data.frame(csums)
max(csumdf$csums) #1906
## [1] 1053
min(csumdf$csums) #49
## [1] 2
mean(csumdf$csums) #730
## [1] 37.84434

how acantharian seqs total:

csums <- colSums(OTUsRS) # col sums = observed ASV richness
csumdf <- as.data.frame(csums)
sum(csumdf$csums)
## [1] 1099858

2.3 Distance and Ordination

remove mislabeled samples

mixedup <- c("F43", "F44", "F45", "F46", "F47", "F48", "F84")
'%ni%' <- Negate('%in%')
ps<- subset_samples(ps, SampleID %ni% mixedup)
ap<- c("#F1B6A1", "#D4A52A", "#E3E5DB", "#A5CFCC", "#0E899F", "#A83860", "#ED91BC", "#DB5339", "#F58851", "#42465C", "#1E479A", "#F7CDA4", "#CF529C", "#11638C")
  • compute CLR normalization, CLR = centered log-ratio, log(x/gx) where gx is the geomentric mean of vector x

2.3.1 Full Community

library("CoDaSeq")
OTU4clr<- data.frame(t(data.frame(otu_table(ps))))
row.names(OTU4clr) <- gsub("\\.", "", row.names(OTU4clr))
OTUs.clr <- codaSeq.clr(OTU4clr + 0.5, samples.by.row=TRUE)
OTU2 <- otu_table(as.matrix(OTUs.clr), taxa_are_rows = FALSE)
metatableCLR <-metatable
row.names(metatableCLR) <- gsub("-", "", row.names(metatableCLR))
META2<- sample_data(metatable)
psCLR <- phyloseq(OTU2,TAX,META2)
ordu = ordinate(psCLR, "PCoA", "euclidean")
p_full<-plot_ordination(psCLR, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values=c(ap[8], ap[12], ap[5], ap[4])) + scale_shape_discrete(labels=c( "0.2", "10.0")) +
geom_point(size=3) +theme(text = element_text(size=16)) + ggtitle("A. Full Communities")
p_full

There is clustering by sample depth: DCM, Surface, and Mid/Bottom. In the DCM and Surface, there is also clustering by filter type.

2.3.1.1 PCoA by depth layer

to better observe effect of filter pore-size

Surface

physeqSurf<- subset_samples(psCLR, Depth == "Surface")
OTUsSurf <- otu_table(physeqSurf)
metaSurf <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSurf),]
adonis2(vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1     3441 0.04371 1.2799  0.193
## Residual 28    75272 0.95629              
## Total    29    78712 1.00000
ordu = ordinate(physeqSurf, "PCoA", "euclidean")
pSurf_full<-plot_ordination(physeqSurf, ordu, color = "Depth",  shape="filter")+theme_bw() +scale_color_manual(values = c(ap[4]))+ geom_point(size=2) +theme(text = element_text(size=16)) +scale_shape_discrete(labels=c( "0.2", "10.0"))

SCM

physeqSCM<- subset_samples(psCLR, Depth == "SCM")
OTUsSCM <- otu_table(physeqSCM)
metaSCM <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSCM),]
adonis2(vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1     2911 0.01497 0.7293  0.899
## Residual 48   191560 0.98503              
## Total    49   194471 1.00000
ordu = ordinate(physeqSCM, "PCoA", "euclidean")
pSCM_full<-plot_ordination(physeqSCM, ordu, color = "Depth",  shape="filter")+theme_bw() +scale_color_manual(values = c(ap[5]))+ geom_point(size=2) +theme(text = element_text(size=16)) +scale_shape_discrete(labels=c( "0.2", "10.0"))

Mid

physeqMid<- subset_samples(psCLR, Depth == "Mid")
OTUsMid <- otu_table(physeqMid)
metaMid <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsMid),]
adonis2(vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1     2543 0.02227 1.0932  0.244
## Residual 48   111645 0.97773              
## Total    49   114187 1.00000
ordu = ordinate(physeqMid, "PCoA", "euclidean")
pMid_full<-plot_ordination(physeqMid, ordu, color = "Depth",  shape="filter")+theme_bw() +scale_color_manual(values = c(ap[12]))+ geom_point(size=2) +theme(text = element_text(size=16)) +scale_shape_discrete(labels=c( "0.2", "10.0"))

Bottom

physeqBot<- subset_samples(psCLR, Depth == "Bottom")
OTUsBot <- otu_table(physeqBot)
metaBot <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsBot),]
adonis2(vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
##          Df SumOfSqs      R2     F Pr(>F)
## filter    1     2473 0.01732 0.899  0.676
## Residual 51   140322 0.98268             
## Total    52   142796 1.00000
ordu = ordinate(physeqBot, "PCoA", "euclidean")
pBot_full<-plot_ordination(physeqBot, ordu, color = "Depth",  shape="filter")+theme_bw() +scale_color_manual(values = c(ap[8]))+ geom_point(size=2) +theme(text = element_text(size=16)) +scale_shape_discrete(labels=c( "0.2", "10.0"))
supp1<-ggarrange(pSurf_full, pSCM_full, pMid_full, pBot_full, common.legend = TRUE, legend = "none")
supp1

2.3.2 Acantharian Communities

library("CoDaSeq")
psAcan <- subset_taxa(ps, D3 =="Acantharea")
OTU4clr<- data.frame(t(data.frame(otu_table(psAcan))))
row.names(OTU4clr) <- gsub("\\.", "", row.names(OTU4clr))
OTUs.clr <- codaSeq.clr(OTU4clr + 0.5, samples.by.row=TRUE)
OTU2 <- otu_table(as.matrix(OTUs.clr), taxa_are_rows = FALSE)
metatableCLR <-metatable
row.names(metatableCLR) <- gsub("-", "", row.names(metatableCLR))
META2<- sample_data(metatable)
psCLR <- phyloseq(OTU2,TAX,META2)
ordu = ordinate(psCLR, "PCoA", "euclidean")
p_Acanth<-plot_ordination(psCLR, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values=c(ap[8], ap[12], ap[5], ap[4])) + scale_shape_discrete(labels=c( "0.2", "10.0")) +
geom_point(size=3) +theme(text = element_text(size=16))  + ggtitle("B. Acantharian Communities")
p_Acanth

There is clustering by sample depth: SCM, Surface, and Mid/Bottom.

2.3.2.1 PCoA by depth layer

Surface

physeqSurf<- subset_samples(psCLR, Depth == "Surface")
OTUsSurf <- otu_table(physeqSurf)
metaSurf <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSurf),]
adonis2(vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1    34.85 0.03285 0.9511  0.475
## Residual 28  1025.93 0.96715              
## Total    29  1060.78 1.00000
ordu = ordinate(physeqSurf, "PCoA", "euclidean")
pSurf_As<-plot_ordination(physeqSurf, ordu, color = "Depth",  shape="filter")+theme_bw() +scale_color_manual(values = c(ap[4]))+ geom_point(size=2) +theme(text = element_text(size=16)) +scale_shape_discrete(labels=c( "0.2", "10.0"))

SCM

physeqSCM<- subset_samples(psCLR, Depth == "SCM")
OTUsSCM <- otu_table(physeqSCM)
metaSCM <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSCM),]
adonis2(vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1     88.2 0.01092 0.5301  0.993
## Residual 48   7986.6 0.98908              
## Total    49   8074.8 1.00000
ordu = ordinate(physeqSCM, "PCoA", "euclidean")
pSCM_As<-plot_ordination(physeqSCM, ordu, color = "Depth",  shape="filter")+theme_bw() +scale_color_manual(values = c(ap[5]))+ geom_point(size=2) +theme(text = element_text(size=16)) +scale_shape_discrete(labels=c( "0.2", "10.0"))

Mid

physeqMid<- subset_samples(psCLR, Depth == "Mid")
OTUsMid <- otu_table(physeqMid)
metaMid <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsMid),]
adonis2(vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1    138.1 0.01857 0.9084  0.502
## Residual 48   7295.9 0.98143              
## Total    49   7434.0 1.00000
ordu = ordinate(physeqMid, "PCoA", "euclidean")
pMid_As<-plot_ordination(physeqMid, ordu, color = "Depth",  shape="filter")+theme_bw() +scale_color_manual(values = c(ap[12]))+ geom_point(size=2) +theme(text = element_text(size=16)) +scale_shape_discrete(labels=c( "0.2", "10.0"))

Bottom

physeqBot<- subset_samples(psCLR, Depth == "Bottom")
OTUsBot <- otu_table(physeqBot)
metaBot <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsBot),]
adonis2(vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
##          Df SumOfSqs      R2     F Pr(>F)
## filter    1    136.9 0.01535 0.795  0.773
## Residual 51   8780.4 0.98465             
## Total    52   8917.2 1.00000
ordu = ordinate(physeqBot, "PCoA", "euclidean")
pBot_As<-plot_ordination(physeqBot, ordu, color = "Depth",  shape="filter")+theme_bw() +scale_color_manual(values = c(ap[8]))+ geom_point(size=2) +theme(text = element_text(size=16)) +scale_shape_discrete(labels=c( "0.2", "10.0"))
supp_As<-ggarrange(pSurf_As, pSCM_As, pMid_As, pBot_As, common.legend = TRUE, legend = "none")
supp_As

supp2<-ggarrange(p_full, p_Acanth, common.legend = TRUE, legend = "bottom")
supp2

2.4 Relative Abundance Plot

Merge Samples –> collapse replicates

Prepare metadata table:

metatable <- read.csv("sampleMap.csv")
row.names(metatable) <- metatable[[1]]
metatable$desc <- paste(metatable$Station, metatable$Depth, metatable$filter, sep="-")
metatable <- metatable[metatable$SampleID %ni% mixedup,]
metatable$Depth <- as.character(metatable$Depth)
metatable$Station <-as.character(metatable$Station)
metatable$filter<-as.character(metatable$filter)
metatable<- metatable[,-which(names(metatable) == "SampleID")]
META<- sample_data(metatable)

ps<- subset_samples(ps, SampleID %ni% mixedup)
ps2<- ps
sample_data(ps2) <- META

Merge:

mergedps <- merge_samples(ps2, "desc")
## Warning in asMethod(object): NAs introduced by coercion

## Warning in asMethod(object): NAs introduced by coercion
print(mergedps)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 22656 taxa and 112 samples ]
## sample_data() Sample Data:       [ 112 samples by 32 sample variables ]
## tax_table()   Taxonomy Table:    [ 22656 taxa by 8 taxonomic ranks ]

number of samples reduced from 211 to 112

Fix meta table in phyloseq object:

meta2<-as.data.frame(sample_data(mergedps))
split<- do.call(rbind, strsplit(row.names(meta2), "-"))
meta2$Depth <- split[,2]
meta2$filter<-split[,3]
meta2$desc <- row.names(meta2)
meta2$Station <- as.character(meta2$Station)
meta2$Depth_f <- factor(meta2$Depth, levels=c('Surface','SCM','Mid','Bottom'))
meta2$Station_f <- factor(meta2$Station, levels=c("11", "10", "9", "8", "3", "4", "2", "5", "12", "13", "14", "15", "17", "18"))
META <-sample_data(meta2)
sample_data(mergedps)<-META

Create phyloseq object with acantharian ASVs only:

phA<-subset_taxa(mergedps, D3 =="Acantharea") 
phAra<- transform_sample_counts(phA, function(OTU) 100* OTU/sum(OTU))
print(phAra)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1053 taxa and 112 samples ]
## sample_data() Sample Data:       [ 112 samples by 34 sample variables ]
## tax_table()   Taxonomy Table:    [ 1053 taxa by 8 taxonomic ranks ]

Relative abundance plot:

Dj<- wes_palette("Darjeeling2")
Cv <- wes_palette("Cavalcanti1")
Gb<- wes_palette("GrandBudapest1")
wpcolor<- c(Dj[2], Dj[4], Cv[2], Cv[4], Gb[2], Cv[5])

taxabarplot<-plot_bar(phAra, x= "Station_f", fill = "D4", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values=wpcolor) + theme(legend.title=element_blank()) + geom_bar(aes( fill=D4), stat="identity", position="stack") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +theme_classic() +theme(text = element_text(size=14)) +ylab("Relative Abundance (%)") + xlab ("Station")

t2<-taxabarplot + theme(legend.position="bottom")
t2

#ggsave("/Users/brisbin/Desktop/MiraiDPI/Acantharea_TaxaBarPlot.pdf", width = 12, height =7)

The relative abundance plot shows that the two filters for each sample are similar at all four water depths. This is different than the pattern seen for the full community, where the samples clustered by filter type in the surface and SCM (and are visibly different in relative abundance plots).

The surface samples are dominated by photosymbiotic acantharian sequences (Arth/Symph).

The SCM sees the addition of Acantharea-Group-II and Group VI and slightly more Chaunacanthida.

Mid and Bottom samples are dominated by Group-I, but still has the others as well, including photosymbiotic clades.

Change order of clades in plot and group Acantharea_X with NA

a_ra <- psmelt(phAra)

a_ra$D4<- as.character(a_ra$D4)

a_ra$D4[is.na(a_ra$D4)] <- "Acantharea_X"

wpcolor<- c(Dj[2], Dj[4], Cv[2], Cv[4], Cv[5], Gb[2])

a_ra$D4<- factor(a_ra$D4, levels= c("Acantharea_X", "Acantharea-Group-I", "Acantharea-Group-II" , "Acantharea-Group-VI" , "Chaunacanthida", "Arthracanthida-Symphyacanthida" ))

a_ra$Station <- factor(a_ra$Station, levels=c("11", "10", "9", "8", "3", "4", "2", "5", "12", "13", "14", "15", "17", "18"))

t2<-ggplot(a_ra, aes(x=Station, y=Abundance, fill=D4, )) + geom_bar(stat = "identity") +facet_grid(filter~Depth_f)+ scale_y_continuous(expand = c(0, 0)) +theme_classic() + scale_fill_manual(values=wpcolor) +theme(text = element_text(size=14)) +ylab("Relative Abundance (%)") + xlab ("Station")

t2<-t2 + theme(legend.position="bottom")
t2

#ggsave("/Users/brisbin/Desktop/MiraiDPI/Acantharea_TaxaBarPlot.pdf", width = 11, height =6)

2.5 ASV ordination plots

GP.ord <- ordinate(phAra, "NMDS", "bray")
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1188792 
## Run 1 stress 0.1236332 
## Run 2 stress 0.1239209 
## Run 3 stress 0.1168659 
## ... New best solution
## ... Procrustes: rmse 0.01053879  max resid 0.109474 
## Run 4 stress 0.1234519 
## Run 5 stress 0.1257378 
## Run 6 stress 0.1270151 
## Run 7 stress 0.1168659 
## ... Procrustes: rmse 3.926368e-05  max resid 0.0002581404 
## ... Similar to previous best
## Run 8 stress 0.1397746 
## Run 9 stress 0.1239147 
## Run 10 stress 0.1168515 
## ... New best solution
## ... Procrustes: rmse 0.0009280845  max resid 0.008860427 
## ... Similar to previous best
## Run 11 stress 0.1303785 
## Run 12 stress 0.1281913 
## Run 13 stress 0.127015 
## Run 14 stress 0.1257307 
## Run 15 stress 0.127015 
## Run 16 stress 0.1267157 
## Run 17 stress 0.1189014 
## Run 18 stress 0.1310686 
## Run 19 stress 0.139774 
## Run 20 stress 0.13038 
## *** Solution reached
p4 = plot_ordination(phAra, GP.ord, type="split", color="Depth", shape="filter", label="Station", title="split") 
p4

When only Acantharian ASVs are included in the ordination, the separation between SCM and surface is more clear. The bottom and mid water samples also separate more than in the full communities.

p1 = plot_ordination(phAra, GP.ord, type="taxa", color="D4", title="taxa") +scale_color_manual(values=colors)
p1 + facet_wrap(~D4, 3)

Acantharea-group-1 is the only group with a clear pattern: primarily present in deep/mid cluster and completely absent in the surface.

2.6 Station relative abundance scatter plots (for comparison with imaging results)

2.6.1 All Acantharians

unfiltered samples - percent acantharean in each 10 um filter - plot as points by depth

phyloseq<-qza_to_phyloseq(features="~/desktop/MiraiFilters/mergedtable.qza")
## Warning in read_qza(features): Artifact was not generated with Qiime2 2018.4,
## if data is not successfully imported, please report here github.com/jbisanz/
## qiime2R/issues
metatable <- read.csv("~/desktop/MiraiFilters/sampleMap.csv")
row.names(metatable) <- metatable[[1]]

metatable$Station <- as.character(metatable$Station)
metatable$filter <- as.character(metatable$filter)

META_all<- sample_data(metatable)

ps = merge_phyloseq(phyloseq, TAX, META_all)

ps<- subset_samples(ps, SampleID %ni% mixedup)

psD3 = tax_glom(ps, "D3")
psD3ra <- transform_sample_counts(psD3, function(OTU) 100* OTU/sum(OTU))
psD3ra_Aonly<- subset_taxa(psD3ra, D3 =="Acantharea") 
psD3ra_Aonly_10<-subset_samples(psD3ra_Aonly, filter == "10")
psD3ra_Aonly_10_otus <-data.frame(t(data.frame(otu_table(psD3ra_Aonly_10))))
names(psD3ra_Aonly_10_otus) <- c("acanth")
metatable_percent_plot <- metatable
row.names(metatable_percent_plot) <- gsub("-", ".", row.names(metatable_percent_plot))
psD3ra_Aonly_10_otus_meta <- merge(psD3ra_Aonly_10_otus, metatable_percent_plot, by = "row.names")
percent_depth <- ggplot(psD3ra_Aonly_10_otus_meta, aes(x=m, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=12)) + xlab("Depth (m)") +ylab("% Acantharian Reads") +ggtitle("A.") +
  geom_smooth(method=lm, color = "#11638C")
percent_DPIstations<- psD3ra_Aonly_10_otus_meta[psD3ra_Aonly_10_otus_meta$Station %in% c("10", "3", "15", "17"),]

percent_DPIstations$Station <- factor(percent_DPIstations$Station, levels = c("10", "3", "15", "17"))

stat.labs <- c("st. 10", "st. 3", "st. 15", "st. 17" )
names(stat.labs) <- c("10", "3", "15", "17")

percent_depth_all_faceted <- ggplot(percent_DPIstations, aes(x=m, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=12)) + xlab("Depth (m)") +ylab("% Acantharian Reads") +ggtitle("A.") +
  geom_smooth(method=lm, color = "#11638C") + facet_grid( ~Station,labeller =  labeller(Station = stat.labs)) +ggtitle("B.")+ theme(strip.background = element_rect(colour='black', fill='white')) + theme(axis.text.x = element_text(angle = 315))
ggarrange(percent_depth, percent_depth_all_faceted, nrow =2 )

#ggsave("/Users/brisbin/Desktop/MiraiDPI/percentabundance_lms.png")
depthLM <- lm(acanth ~ m, data = psD3ra_Aonly_10_otus_meta)
summary(depthLM)
## 
## Call:
## lm(formula = acanth ~ m, data = psD3ra_Aonly_10_otus_meta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.9495 -1.3189 -0.5304  0.8246 10.8352 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.9455573  0.3096055   9.514 6.97e-16 ***
## m           0.0021050  0.0003526   5.970 3.20e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.46 on 106 degrees of freedom
## Multiple R-squared:  0.2516, Adjusted R-squared:  0.2446 
## F-statistic: 35.64 on 1 and 106 DF,  p-value: 3.199e-08

Percent contribution of acantharians to the full community is significantly positively correlated to depth (increased % with increased depth).

By station: station 10

depthLM10 <- lm(acanth ~ m, data = psD3ra_Aonly_10_otus_meta[psD3ra_Aonly_10_otus_meta$Station == "10",])
summary(depthLM10)
## 
## Call:
## lm(formula = acanth ~ m, data = psD3ra_Aonly_10_otus_meta[psD3ra_Aonly_10_otus_meta$Station == 
##     "10", ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0785 -1.4908 -0.6387  0.9659  3.3003 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 3.129084   1.076398   2.907   0.0271 *
## m           0.001238   0.001292   0.958   0.3751  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.208 on 6 degrees of freedom
## Multiple R-squared:  0.1327, Adjusted R-squared:  -0.01188 
## F-statistic: 0.9178 on 1 and 6 DF,  p-value: 0.3751

station 15

depthLM15 <- lm(acanth ~ m, data = psD3ra_Aonly_10_otus_meta[psD3ra_Aonly_10_otus_meta$Station == "15",])
summary(depthLM15)
## 
## Call:
## lm(formula = acanth ~ m, data = psD3ra_Aonly_10_otus_meta[psD3ra_Aonly_10_otus_meta$Station == 
##     "15", ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4736 -1.7452 -0.5944  1.1665  4.6814 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 4.256268   1.327889   3.205   0.0185 *
## m           0.001824   0.002538   0.719   0.4993  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.54 on 6 degrees of freedom
## Multiple R-squared:  0.07928,    Adjusted R-squared:  -0.07417 
## F-statistic: 0.5167 on 1 and 6 DF,  p-value: 0.4993

station 17

depthLM17 <- lm(acanth ~ m, data = psD3ra_Aonly_10_otus_meta[psD3ra_Aonly_10_otus_meta$Station == "17",])
summary(depthLM17)
## 
## Call:
## lm(formula = acanth ~ m, data = psD3ra_Aonly_10_otus_meta[psD3ra_Aonly_10_otus_meta$Station == 
##     "17", ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.26809 -0.17481  0.04584  0.21244  1.16932 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 2.3476085  0.4014060   5.848  0.00110 **
## m           0.0033857  0.0007679   4.409  0.00452 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7498 on 6 degrees of freedom
## Multiple R-squared:  0.7641, Adjusted R-squared:  0.7248 
## F-statistic: 19.44 on 1 and 6 DF,  p-value: 0.004525

station 3

depthLM3 <- lm(acanth ~ m, data = psD3ra_Aonly_10_otus_meta[psD3ra_Aonly_10_otus_meta$Station == "3",])
summary(depthLM3)
## 
## Call:
## lm(formula = acanth ~ m, data = psD3ra_Aonly_10_otus_meta[psD3ra_Aonly_10_otus_meta$Station == 
##     "3", ])
## 
## Residuals:
##       78       79       80       81       82 
##  0.08525 -1.74554 -0.20242  0.35717  1.50555 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  2.072e+00  7.569e-01   2.738   0.0715 .
## m           -1.395e-05  6.746e-04  -0.021   0.9848  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.353 on 3 degrees of freedom
## Multiple R-squared:  0.0001425,  Adjusted R-squared:  -0.3331 
## F-statistic: 0.0004275 on 1 and 3 DF,  p-value: 0.9848

Individual station linear model is only significant for station 17.

2.6.2 Only photosymbiotic acantharian clades

psD4 = tax_glom(ps, "D4")
psD4ra <- transform_sample_counts(psD4, function(OTU) 100* OTU/sum(OTU))
psD4ra_Aonly<- subset_taxa(psD4ra, D4 %in% c("Arthracanthida-Symphyacanthida", "Chaunacanthida")) 
psD4ra_Aonly_10<-subset_samples(psD4ra_Aonly, filter == "10")
psD4ra_Aonly_10_otus <-data.frame(t(data.frame(otu_table(psD4ra_Aonly_10))))
psD4ra_Aonly_10_otus$acanth <- rowSums(psD4ra_Aonly_10_otus)
metatable_percent_plot <- metatable
row.names(metatable_percent_plot) <- gsub("-", ".", row.names(metatable_percent_plot))
psD4ra_Aonly_10_otus_meta <- merge(psD4ra_Aonly_10_otus, metatable_percent_plot, by = "row.names")
percent_depth <- ggplot(psD4ra_Aonly_10_otus_meta, aes(x=m, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=12)) + xlab("Depth (m)") +ylab("% Arthracanthida-Symphyacanthida-Chaunacanthida") +ggtitle("A.") +
  geom_smooth(method=lm, color = Cv[5]) +theme( axis.title.y = element_text(size = 10))
percent_DPIstations<- psD4ra_Aonly_10_otus_meta[psD3ra_Aonly_10_otus_meta$Station %in% c("10", "3", "15", "17"),]

percent_DPIstations$Station <- factor(percent_DPIstations$Station, levels = c("10", "3", "15", "17"))

stat.labs <- c("st. 10", "st. 3", "st. 15", "st. 17" )
names(stat.labs) <- c("10", "3", "15", "17")

percent_depth_all_faceted <- ggplot(percent_DPIstations, aes(x=m, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=12)) + xlab("Depth (m)") +ylab("% Arthracanthida-Symphyacanthida-Chaunacanthida") +
  geom_smooth(method=lm, color = Cv[5]) + facet_grid( ~Station,labeller =  labeller(Station = stat.labs)) +ggtitle("B.")+ theme(strip.background = element_rect(colour='black', fill='white')) + theme(axis.text.x = element_text(angle = 315), axis.title.y = element_text(size = 10))
ggarrange(percent_depth, percent_depth_all_faceted, nrow =2 )

#ggsave("/Users/brisbin/Desktop/MiraiDPI/percentabundance_ASC_lms.png")
depthLM <- lm(acanth ~ m, data = psD4ra_Aonly_10_otus_meta)
summary(depthLM)
## 
## Call:
## lm(formula = acanth ~ m, data = psD4ra_Aonly_10_otus_meta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6689 -0.7444 -0.3675  0.4372  6.3853 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.9937184  0.1520360  13.113  < 2e-16 ***
## m           -0.0008280  0.0001732  -4.782 5.64e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.208 on 106 degrees of freedom
## Multiple R-squared:  0.1774, Adjusted R-squared:  0.1697 
## F-statistic: 22.86 on 1 and 106 DF,  p-value: 5.636e-06

Percent contribution of photosymbiotic acantharians to the full community is significantly negatively correlated to depth (decreased % with increased depth).

By station: station 10

depthLM10 <- lm(acanth ~ m, data = psD4ra_Aonly_10_otus_meta[psD4ra_Aonly_10_otus_meta$Station == "10",])
summary(depthLM10)
## 
## Call:
## lm(formula = acanth ~ m, data = psD4ra_Aonly_10_otus_meta[psD4ra_Aonly_10_otus_meta$Station == 
##     "10", ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.63250 -0.20514  0.03464  0.14869  0.66653 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.7165168  0.2147842   7.992 0.000205 ***
## m           -0.0009643  0.0002578  -3.741 0.009608 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4405 on 6 degrees of freedom
## Multiple R-squared:    0.7,  Adjusted R-squared:  0.6499 
## F-statistic:    14 on 1 and 6 DF,  p-value: 0.009608

station 15

depthLM15 <- lm(acanth ~ m, data = psD4ra_Aonly_10_otus_meta[psD4ra_Aonly_10_otus_meta$Station == "15",])
summary(depthLM15)
## 
## Call:
## lm(formula = acanth ~ m, data = psD4ra_Aonly_10_otus_meta[psD4ra_Aonly_10_otus_meta$Station == 
##     "15", ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2144 -0.8647 -0.2261  0.0278  4.4998 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  4.138961   1.112769   3.720  0.00986 **
## m           -0.004704   0.002127  -2.212  0.06898 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.128 on 6 degrees of freedom
## Multiple R-squared:  0.4491, Adjusted R-squared:  0.3573 
## F-statistic: 4.892 on 1 and 6 DF,  p-value: 0.06898

station 17

depthLM17 <- lm(acanth ~ m, data = psD4ra_Aonly_10_otus_meta[psD4ra_Aonly_10_otus_meta$Station == "17",])
summary(depthLM17)
## 
## Call:
## lm(formula = acanth ~ m, data = psD4ra_Aonly_10_otus_meta[psD4ra_Aonly_10_otus_meta$Station == 
##     "17", ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.10954 -0.35440  0.00388  0.28428  1.31145 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  2.288311   0.421833   5.425  0.00163 **
## m           -0.001843   0.000807  -2.284  0.06249 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.788 on 6 degrees of freedom
## Multiple R-squared:  0.465,  Adjusted R-squared:  0.3758 
## F-statistic: 5.215 on 1 and 6 DF,  p-value: 0.06249

station 3

depthLM3 <- lm(acanth ~ m, data = psD4ra_Aonly_10_otus_meta[psD4ra_Aonly_10_otus_meta$Station == "3",])
summary(depthLM3)
## 
## Call:
## lm(formula = acanth ~ m, data = psD4ra_Aonly_10_otus_meta[psD4ra_Aonly_10_otus_meta$Station == 
##     "3", ])
## 
## Residuals:
##      78      79      80      81      82 
##  0.3811 -1.3858  0.1889 -0.5314  1.3473 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  1.7106609  0.6619871   2.584   0.0815 .
## m           -0.0007982  0.0005900  -1.353   0.2690  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.183 on 3 degrees of freedom
## Multiple R-squared:  0.379,  Adjusted R-squared:  0.172 
## F-statistic: 1.831 on 1 and 3 DF,  p-value: 0.269

Individual station linear model is only significant for station 10.

3 High-throughput imaging analysis

Acantharian ROIs and full raw images are archived on Zenodo (doi: 10.5281/zenodo.3605400).

Load additional packages:

library(readr)
library(reshape2)
library("plyr", lib.loc="/Library/Frameworks/R.framework/Versions/3.4/Resources/library")

3.1 Acantharian abundance by depth

Station 17 Load and format CTD data:

ctd17 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0006/CTD/6KCDT0006_CTD_data_1sec_int.csv")
ctd17 <- ctd17[,1:4]
names(ctd17) <- c("Time", "Temp", "Salinity", "Depth")
ctd17$Depth <- ctd17$Depth * -1
ctd17$station <- 17
ctd17$Time <- gsub(":", "", ctd17$Time)
ctd17$Time2 <- paste0('201706090', ctd17$Time)
st17aCTD <- ctd17[,c(4,6)]
names(st17aCTD)<-c("Depth","photos")
str(st17aCTD)
## 'data.frame':    9576 obs. of  2 variables:
##  $ Depth : num  -12.6 -12.9 -13.1 -13.3 -13.3 ...
##  $ photos: chr  "20170609085210" "20170609085211" "20170609085212" "20170609085213" ...

Load and format image data:

st17as<- read.table("~/Desktop/MiraiDPI/st17DPI/st17rois.txt", header = FALSE)
names(st17as) <- c("photos")
st17as$photos <- strtrim(st17as$photos, 14)
st17as$photos <- as.numeric(st17as$photos)
## Warning: NAs introduced by coercion

Add depth from CTD data to images:

st17acanth <- merge(st17aCTD,st17as, by = "photos" )

Calculate the sampling volume (in order to get cells/L): Load names of all images from station 17 downcast:

st17pics<- read.table("~/desktop/MiraiDPI/st17DPI/st17ALLphotos.txt", header = TRUE)

st17pics$photos <- strtrim(st17pics$photos, 14)
st17pics$photos <- as.numeric(st17pics$photos)
## Warning: NAs introduced by coercion
st17pics <- merge(st17aCTD,st17pics, by = "photos" )

Get dataframe of the number of photos by depth at 10m intervals:

br = seq( -800, 0, by = 10)
ranges = br[-1]
freq   = hist(st17pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

P<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P)
## 'data.frame':    80 obs. of  2 variables:
##  $ depth         : num  -790 -780 -770 -760 -750 -740 -730 -720 -710 -700 ...
##  $ photofrequency: int  0 0 215 26 26 25 26 26 26 7 ...

How many photos total?

sum(P$photofrequency)
## [1] 2453

Get same data frame for Acantharian frequency:

br = seq( -800, 0, by = 10)
ranges = br[-1]
freq   = hist(st17acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

A<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A)
## 'data.frame':    80 obs. of  2 variables:
##  $ depth          : num  -790 -780 -770 -760 -750 -740 -730 -720 -710 -700 ...
##  $ Acanthfrequency: int  0 0 2 0 0 0 0 0 0 0 ...

How many acanths total?

sum(A$Acanthfrequency)
## [1] 237

Divide the acantharian frequency by photo frequency and convert to cells/L.

For stations 3 and 10 the camera and led were 15.4cm apart. For stations 15 and 17, the camera and LED were 14 cm apart.

5.5 x 4.6 x 15.4 = 389.62 cubic cm = 0.38962 L at stations 3 & 10 5.5 x 4.6 x 14 = 354.2 cubic cm = 0.3542 L at stations 15 & 17

station 17 volume calculation:

frequencyDF17 <- merge(A, P, by = "depth")[-1,]
frequencyDF17$conc <- frequencyDF17$Acanthfrequency / frequencyDF17$photofrequency
frequencyDF17$perLiter <- frequencyDF17$conc / 0.3542
frequencyDF17$Station <- "st.17"
frequencyDF17$pos_depth <- frequencyDF17$depth * -1

Station 3

Load and format CTD data:

ctd3 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0003/CTD/6KCDT0003_CTD_data_1sec_int.csv")
ctd3 <- ctd3[,1:4]
names(ctd3) <- c("Time", "Temp", "Salinity", "Depth")
ctd3$Depth <- ctd3$Depth * -1
ctd3$station <- 3
ctd3$Time <- gsub(":", "", ctd3$Time)
ctd3$Time2 <- paste0('201705310', ctd3$Time)
st3aCTD <- ctd3[,c(4,6)]
names(st3aCTD)<-c("Depth","photos")
str(st3aCTD)
## 'data.frame':    6584 obs. of  2 variables:
##  $ Depth : num  -7.39 -7.28 -7.21 -7.22 -7.26 ...
##  $ photos: chr  "20170531074319" "20170531074320" "20170531074321" "20170531074322" ...

Load and format image data:

st3as<- read.table("~/Desktop/MiraiDPI/st3DPI/st3rois.txt", header = FALSE)
names(st3as) <- c("photos")
st3as$photos <- strtrim(st3as$photos, 14)
st3as$photos <- as.numeric(st3as$photos)
## Warning: NAs introduced by coercion
st3acanth <- merge(st3aCTD,st3as, by = "photos" )

Calculate the sampling volume (in order to get cells/L): Load names of all images from station 3 downcast:

st3pics<- read.table("~/desktop/MiraiDPI/st3DPI/st3ALLphotos.txt", header = TRUE)
st3pics$photos <- strtrim(st3pics$photos, 14)
st3pics$photos <- as.numeric(st3pics$photos)
st3pics <- merge(st3aCTD,st3pics, by = "photos" )
st3pics <- st3pics[,-3]
names(st3pics)<- c("photos", "Depth")
st3pics$photos <- as.character(st3pics$photos)

Get data frame of image frequency by depth in 10m intervals:

br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq   = hist(st3pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

P3<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P3)
## 'data.frame':    102 obs. of  2 variables:
##  $ depth         : num  -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
##  $ photofrequency: int  0 109 38 39 40 24 33 38 39 38 ...

How many photos total?

sum(P3$photofrequency)
## [1] 4010

Get same data frame for Acantharian frequency:

br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq   = hist(st3acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

A3<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A3)
## 'data.frame':    102 obs. of  2 variables:
##  $ depth          : num  -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
##  $ Acanthfrequency: int  0 0 0 0 0 0 0 0 0 0 ...

How many acanths total?

sum(A3$Acanthfrequency)
## [1] 115

Divide the acantharian frequency by photo frequency and convert to cells/L. And, plot cells/L by depth:

frequencyDF3 <- merge(A3, P3, by = "depth")[-1,]
frequencyDF3$conc <- frequencyDF3$Acanthfrequency / frequencyDF3$photofrequency
frequencyDF3$perLiter <- frequencyDF3$conc / 0.38962
frequencyDF3$pos_depth <- frequencyDF3$depth * -1
frequencyDF3$Station <- "st.3"

Station 10 Load and format CTD data:

ctd10 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0004/CTD/6KCDT0004_CTD_data_1sec_int.csv")
ctd10 <- ctd10[,1:4]
names(ctd10) <- c("Time", "Temp", "Salinity", "Depth")
ctd10$Depth <- ctd10$Depth * -1
ctd10$station <- 10
ctd10$Time <- gsub(":", "", ctd10$Time)
ctd10$Time2 <- paste0('201706020', ctd10$Time)
st10aCTD <- ctd10[,c(4,6)]
names(st10aCTD)<-c("Depth","photos")

Load and format image data:

st10aNames<- c("photos")
st10as<- read.table("~/Desktop/MiraiDPI/st10DPI/st10rois.txt", col.names = st10aNames)
st10as$photos <- strtrim(st10as$photos, 14)
st10acanth <- merge(st10aCTD,st10as, by = "photos" )

Calculate the sampling volume (in order to get cells/L): Load names of all images from station 3 downcast:

st10pics<- read.table("~/desktop/MiraiDPI/st10DPI/st10ALLphotos.txt", col.names = st10aNames)
st10pics$photos <- strtrim(st10pics$photos, 14)
st10pics$photos <- as.numeric(st10pics$photos)
st10pics <- merge(st10aCTD,st10pics, by = "photos" )
st10pics <- st10pics[,-3]
names(st10pics)<- c("photos", "Depth")
st10pics$photos <- as.character(st10pics$photos)

Get data frame of image frequency by depth in 10m intervals:

br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq   = hist(st10pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

P10<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P10)
## 'data.frame':    102 obs. of  2 variables:
##  $ depth         : num  -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
##  $ photofrequency: int  0 6 51 33 32 33 32 32 33 33 ...

How many photos total?

sum(P10$photofrequency)
## [1] 3639

Get same data frame for Acantharian frequency:

br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq   = hist(st10acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

A10<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A10)
## 'data.frame':    102 obs. of  2 variables:
##  $ depth          : num  -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
##  $ Acanthfrequency: int  0 0 1 0 0 0 0 0 0 4 ...
sum(A10$Acanthfrequency)
## [1] 359

Divide the acantharian frequency by photo frequency and convert to cells/L. And, plot cells/L by depth:

frequencyDF10 <- merge(A10, P10, by = "depth")[-1,]
frequencyDF10$conc <- frequencyDF10$Acanthfrequency / frequencyDF10$photofrequency
frequencyDF10$perLiter <- frequencyDF10$conc / 0.38962
frequencyDF10$Station <- "st.10"
frequencyDF10$pos_depth <- frequencyDF10$depth * -1

Station 15

Load and format CTD data:

ctd15 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0005/CTD/6KCDT0005_CTD_data_1sec_int.csv")
ctd15 <- ctd15[,1:4]
names(ctd15) <- c("Time", "Temp", "Salinity", "Depth")
ctd15$Depth <- ctd15$Depth * -1
ctd15$station <- 10
ctd15$Time <- gsub(":", "", ctd15$Time)

ctd15$Time2 <- paste0('20170606', ctd15$Time)
st15aCTD <- ctd15[,c(4,6)]
names(st15aCTD)<-c("Depth","photos")

st15as<- read.table("~/Desktop/MiraiDPI/st15DPI/st15rois.txt", col.names = st10aNames)
st15as$photos <- strtrim(st15as$photos, 14)

st15acanth <- merge(st15aCTD,st15as, by = "photos" )

Calculate the sampling volume (in order to get cells/L): Load names of all images from station 3 downcast:

st15pics<- read.table("~/desktop/MiraiDPI/st15DPI/st15ALLphotos.txt", col.names = st10aNames)
st15pics$photos <- strtrim(st15pics$photos, 14)
st15pics$photos <- as.numeric(st15pics$photos)
## Warning: NAs introduced by coercion
st15pics <- merge(st15aCTD,st15pics, by = "photos" )
st15pics <- st15pics[,-3]
names(st15pics)<- c("photos", "Depth")
st15pics$photos <- as.character(st15pics$photos)

Get data frame of image frequency by depth in 10m intervals:

br = seq( -780, 0, by = 10)
ranges = br[-1]
freq   = hist(st15pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

P15<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P15)
## 'data.frame':    78 obs. of  2 variables:
##  $ depth         : num  -770 -760 -750 -740 -730 -720 -710 -700 -690 -680 ...
##  $ photofrequency: int  38 50 49 45 42 39 42 41 39 37 ...

How many photos total?

sum(P15$photofrequency)
## [1] 3056

Get same data frame for Acantharian frequency:

br = seq( -780, 0, by = 10)
ranges = br[-1]
freq   = hist(st15acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

A15<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A15)
## 'data.frame':    78 obs. of  2 variables:
##  $ depth          : num  -770 -760 -750 -740 -730 -720 -710 -700 -690 -680 ...
##  $ Acanthfrequency: int  1 1 0 3 2 4 2 1 5 3 ...
sum(A15$Acanthfrequency)
## [1] 524

Divide the acantharian frequency by photo frequency and convert to cells/L. And, plot cells/L by depth:

frequencyDF15 <- merge(A15, P15, by = "depth")[-1,]
frequencyDF15$conc <- frequencyDF15$Acanthfrequency / frequencyDF15$photofrequency
frequencyDF15$perLiter <- frequencyDF15$conc / 0.3542
frequencyDF15$pos_depth <- frequencyDF15$depth * -1
frequencyDF15$Station <- "st.15"

rbind and plot:

frequencyDF_all<- rbind(frequencyDF10, frequencyDF15, frequencyDF17, frequencyDF3)

frequencyDF_all$Station <- factor(frequencyDF_all$Station, levels = c("st.10", "st.3", "st.15", "st.17"))

conc_depth_all <- ggplot(frequencyDF_all, aes(x=pos_depth, y = perLiter)) + geom_point() + theme_bw() +theme(text = element_text(size=12)) + xlab("Depth (m)") +ylab("Acantharian cells per Liter") +ggtitle("A.") + geom_smooth(method=lm, formula= (log(y+1) ~ log(x+1)), color = "#0E899F")

conc_depth_all_faceted <- conc_depth_all + facet_grid(~Station) +ggtitle("B.") + theme(strip.background = element_rect(colour='black', fill='white'))

ggarrange(conc_depth_all, conc_depth_all_faceted, nrow =2 )

#ggsave("~/desktop/MiraiDPI/conc_by_depth.png")

maximum concentration and depth at each station:

frequencyDF_all %>% dplyr::select(perLiter, depth, Station) %>% group_by(Station) %>% top_n(n=1, wt = perLiter) %>% arrange(Station)
## # A tibble: 4 x 3
## # Groups:   Station [4]
##   perLiter depth Station
##      <dbl> <dbl> <fct>  
## 1    1.81    -50 st.10  
## 2    0.921   -30 st.3   
## 3    4.71      0 st.15  
## 4    2.82      0 st.17

Linear model for all 4 stations:

depth_conc_LM <- lm(log(perLiter+1) ~ log(pos_depth+1), data = frequencyDF_all)
summary(depth_conc_LM)
## 
## Call:
## lm(formula = log(perLiter + 1) ~ log(pos_depth + 1), data = frequencyDF_all)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.68182 -0.09095 -0.00278  0.06052  0.82563 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.17752    0.04621   25.48   <2e-16 ***
## log(pos_depth + 1) -0.18197    0.00786  -23.15   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1568 on 354 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.6023, Adjusted R-squared:  0.6011 
## F-statistic:   536 on 1 and 354 DF,  p-value: < 2.2e-16

station 10

depth_conc_LM10 <- lm(log(perLiter+1) ~ log(pos_depth+1), data = frequencyDF10) 
summary(depth_conc_LM10)
## 
## Call:
## lm(formula = log(perLiter + 1) ~ log(pos_depth + 1), data = frequencyDF10)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30326 -0.05565 -0.00232  0.04542  0.54603 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.20114    0.08181   14.68   <2e-16 ***
## log(pos_depth + 1) -0.18143    0.01360  -13.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1244 on 98 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.6449, Adjusted R-squared:  0.6412 
## F-statistic: 177.9 on 1 and 98 DF,  p-value: < 2.2e-16

station 15

depth_conc_LM15 <- lm(log(perLiter+1) ~ log(pos_depth+1), data = frequencyDF15)
summary(depth_conc_LM15)
## 
## Call:
## lm(formula = log(perLiter + 1) ~ log(pos_depth + 1), data = frequencyDF15)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35507 -0.17145 -0.02763  0.13900  0.58219 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.70185    0.12273   13.87   <2e-16 ***
## log(pos_depth + 1) -0.26377    0.02148  -12.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2073 on 75 degrees of freedom
## Multiple R-squared:  0.6678, Adjusted R-squared:  0.6634 
## F-statistic: 150.8 on 1 and 75 DF,  p-value: < 2.2e-16

station 17

depth_conc_LM17 <- lm(log(perLiter+1) ~ log(pos_depth+1), data = frequencyDF17)
summary(depth_conc_LM17)
## 
## Call:
## lm(formula = log(perLiter + 1) ~ log(pos_depth + 1), data = frequencyDF17)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.29928 -0.07205  0.01927  0.07570  0.27643 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.24666    0.07014   17.77   <2e-16 ***
## log(pos_depth + 1) -0.20116    0.01225  -16.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1189 on 76 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.7801, Adjusted R-squared:  0.7773 
## F-statistic: 269.7 on 1 and 76 DF,  p-value: < 2.2e-16

station 3

depth_conc_LM3 <- lm(log(perLiter+1) ~ log(pos_depth+1), data = frequencyDF3)
summary(depth_conc_LM3)
## 
## Call:
## lm(formula = log(perLiter + 1) ~ log(pos_depth + 1), data = frequencyDF3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.15257 -0.05045  0.00048  0.03226  0.35040 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         0.642812   0.045212   14.22   <2e-16 ***
## log(pos_depth + 1) -0.099063   0.007554  -13.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08229 on 99 degrees of freedom
## Multiple R-squared:  0.6347, Adjusted R-squared:  0.631 
## F-statistic:   172 on 1 and 99 DF,  p-value: < 2.2e-16

3.2 correlation between image conc and sequence % abundance

frequencyDF_surf <- frequencyDF_all%>% group_by(Station) %>% dplyr::select(pos_depth, perLiter) %>% filter(pos_depth < 50) %>% filter(!is.na(perLiter)) %>% summarize(Conc= mean(perLiter))
## Adding missing grouping variables: `Station`
frequencyDF_surf$Depth<- "Surface"
frequencyDF_SCM <- frequencyDF_all%>% group_by(Station) %>% dplyr::select(pos_depth, perLiter) %>% filter(pos_depth > 50 & pos_depth < 150) %>% filter(!is.na(perLiter)) %>% summarize(Conc= mean(perLiter))
## Adding missing grouping variables: `Station`
frequencyDF_SCM$Depth <-"SCM"
frequencyDF_Mid <- frequencyDF_all%>% group_by(Station) %>% dplyr::select(pos_depth, perLiter) %>% filter(pos_depth > 150 & pos_depth < 700) %>% filter(!is.na(perLiter)) %>% summarize(Conc= mean(perLiter))
## Adding missing grouping variables: `Station`
frequencyDF_Mid$Depth <- "Mid"
frequencyDF_Bot<- frequencyDF_all%>% group_by(Station) %>% dplyr::select(pos_depth, perLiter) %>% filter(pos_depth > 700) %>% filter(!is.na(perLiter)) %>% summarize(Conc= mean(perLiter))
## Adding missing grouping variables: `Station`
frequencyDF_Bot$Depth <- "Bottom"
frequencyBYlayer<-rbind(frequencyDF_surf, frequencyDF_SCM, frequencyDF_Mid, frequencyDF_Bot)

mergedps merged replicates so there is one sample per combination of variables

psD4 = tax_glom(mergedps, "D4")
psD4ra <- transform_sample_counts(psD4, function(OTU) 100* OTU/sum(OTU))
psD4ra_Aonly<- subset_taxa(psD4ra, D4 %in% c("Arthracanthida-Symphyacanthida", "Chaunacanthida")) 
psD4ra_Aonly_10<-subset_samples(psD4ra_Aonly, filter == "10")
psD4ra_Aonly_10_otus <-data.frame(otu_table(psD4ra_Aonly_10))
psD4ra_Aonly_10_otus$acanth <- rowSums(psD4ra_Aonly_10_otus)


metatable <- read.csv("sampleMap.csv")
row.names(metatable) <- metatable[[1]]
metatable$desc <- paste(metatable$Station, metatable$Depth, metatable$filter, sep="-")
metatable <- metatable[metatable$SampleID %ni% mixedup,]
metatable$Depth <- as.character(metatable$Depth)
metatable$Station <-as.character(metatable$Station)
metatable$filter<-as.character(metatable$filter)
metatable<- metatable[,-which(names(metatable) == "SampleID")]


metatable_percent_plot <- metatable %>% distinct(desc, .keep_all = TRUE) %>% filter(filter== "10")

row.names(metatable_percent_plot) <- metatable_percent_plot$desc
psD4ra_Aonly_10_otus_meta <- merge(psD4ra_Aonly_10_otus, metatable_percent_plot, by = "row.names")

DPIstations <- psD4ra_Aonly_10_otus_meta %>% dplyr::select(acanth, Station, Depth)

DPIstations$Station <- paste0('st.', DPIstations$Station)

final <- merge(DPIstations, frequencyBYlayer)

plot count by percent

countBYperc <- ggplot(final, aes(x=Conc, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=12)) + xlab("cells per L") +ylab("% Arthracanthida-Symphyacanthida-Chaunacanthida") +
  geom_smooth(method=lm, color = Cv[5]) +theme( axis.title.y = element_text(size = 10))
countBYperc

cBYp <- lm(acanth ~ Conc, data = final)
summary(cBYp)
## 
## Call:
## lm(formula = acanth ~ Conc, data = final)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4343 -0.6989 -0.3303  0.0861  5.0134 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.2619     0.2009   6.280 6.04e-08 ***
## Conc          0.4643     0.2286   2.031   0.0472 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.206 on 54 degrees of freedom
## Multiple R-squared:  0.07095,    Adjusted R-squared:  0.05375 
## F-statistic: 4.124 on 1 and 54 DF,  p-value: 0.04722
final2 <- final %>% filter(Conc < 2 & acanth < 5) 

countBYperc2 <- ggplot(final2, aes(x=Conc, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=12)) + xlab("cells per L") +ylab("% Arthracanthida-Symphyacanthida-Chaunacanthida") +
  geom_smooth(method=lm, color = Cv[5]) +theme( axis.title.y = element_text(size = 10))
countBYperc2

#ggsave("countbypercent_noO.png", width = 4, height = 4)
cBYp2 <- lm(acanth ~ Conc, data = final2)
summary(cBYp2)
## 
## Call:
## lm(formula = acanth ~ Conc, data = final2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3995 -0.5925 -0.2245  0.1519  3.3254 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.1460     0.1680   6.822 8.72e-09 ***
## Conc          0.5110     0.1896   2.696  0.00939 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9986 on 53 degrees of freedom
## Multiple R-squared:  0.1206, Adjusted R-squared:  0.104 
## F-statistic: 7.266 on 1 and 53 DF,  p-value: 0.009394

3.3 Acantharian cell size distribution by depth

cropped all acantharian photos to touch the edges of the acantharian spines. Then got image size in pixels with: file ./* > imagesize.csv

entire image is 2448 x 2050 pixels 5.5 cm accross (55,000 µm) 55,000 / 2448 = 22.5

Station 17 Load and format data:

imagesize17 <- read.csv("~/Desktop/MiraiDPI/st17DPI/st17imagesize.csv", header = FALSE)
imagesize17$V1 <- substring(imagesize17$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize17$V2), " "))
imagesize17$height <- as.numeric(split[,2])
imagesize17$width <- as.numeric(split[,4])
imagesize17<- imagesize17[,c(1,5,6)]
names(imagesize17) <- c("photos", "height", "width")
imagesizedepth17<- merge(st17aCTD,imagesize17, by = "photos" )
imagesizedepth17$height <- (imagesizedepth17$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth17$width <- (imagesizedepth17$width * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth17$size <- imagesizedepth17$height * imagesizedepth17$width
imagesizedepth17$pos_depth <- imagesizedepth17$Depth * -1
imagesizedepth17$Station <- "st.17"
str(imagesizedepth17)
## 'data.frame':    237 obs. of  7 variables:
##  $ photos   : chr  "20170609085210" "20170609085215" "20170609085215" "20170609085220" ...
##  $ Depth    : num  -12.6 -13.2 -13.2 -13.1 -13.4 ...
##  $ height   : num  0.54 0.562 0.652 0.652 0.517 ...
##  $ width    : num  0.585 0.585 0.608 0.608 0.472 ...
##  $ size     : num  0.316 0.329 0.396 0.396 0.245 ...
##  $ pos_depth: num  12.6 13.2 13.2 13.1 13.4 ...
##  $ Station  : chr  "st.17" "st.17" "st.17" "st.17" ...

Station 3 Load and format data:

imagesize3 <- read.csv("~/Desktop/MiraiDPI/st3DPI/st3imagesize.csv", header = FALSE)
imagesize3$V1 <- substring(imagesize3$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize3$V2), " "))
imagesize3$height <- as.numeric(split[,2])
imagesize3$width <- as.numeric(split[,4])
imagesize3<- imagesize3[,c(1,5,6)]
names(imagesize3) <- c("photos", "height", "width")
imagesizedepth3<- merge(st3aCTD,imagesize3, by = "photos" )
imagesizedepth3$height <- (imagesizedepth3$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth3$width <- (imagesizedepth3$width * 22.5)/1000 #22.5um per pixel  /1000 to convert um to mm
imagesizedepth3$size <- imagesizedepth3$height * imagesizedepth3$width
imagesizedepth3$pos_depth <- imagesizedepth3$Depth * -1
imagesizedepth3$Station <- "st.3"
str(imagesizedepth3)
## 'data.frame':    115 obs. of  7 variables:
##  $ photos   : chr  "20170531074321" "20170531074322" "20170531074340" "20170531074342" ...
##  $ Depth    : num  -7.21 -7.22 -7.09 -7.24 -7.24 ...
##  $ height   : num  1.035 0.765 0.45 0.765 0.45 ...
##  $ width    : num  0.922 0.36 0.472 0.652 0.472 ...
##  $ size     : num  0.955 0.275 0.213 0.499 0.213 ...
##  $ pos_depth: num  7.21 7.22 7.09 7.24 7.24 ...
##  $ Station  : chr  "st.3" "st.3" "st.3" "st.3" ...

Station 10 Load and format data:

imagesize10 <- read.csv("~/Desktop/MiraiDPI/st10DPI/st10imagesize.csv", header = FALSE)
imagesize10$V1 <- substring(imagesize10$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize10$V2), " "))
imagesize10$height <- as.numeric(split[,2])
imagesize10$width <- as.numeric(split[,4])
imagesize10<- imagesize10[,c(1,5,6)]
names(imagesize10) <- c("photos", "height", "width")
imagesizedepth10<- merge(st10aCTD,imagesize10, by = "photos" )
imagesizedepth10$height <- (imagesizedepth10$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth10$width <- (imagesizedepth10$width * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth10$size <- imagesizedepth10$height * imagesizedepth10$width
imagesizedepth10$pos_depth <- imagesizedepth10$Depth * -1
imagesizedepth10$Station <- "st.10"
str(imagesizedepth10)
## 'data.frame':    359 obs. of  7 variables:
##  $ photos   : chr  "20170602061002" "20170602061004" "20170602061007" "20170602061008" ...
##  $ Depth    : num  -18.9 -19.4 -18.8 -18.5 -18.5 ...
##  $ height   : num  1.103 0.315 0.495 1.058 0.765 ...
##  $ width    : num  1.08 0.405 1.035 0.945 0.81 ...
##  $ size     : num  1.191 0.128 0.512 0.999 0.62 ...
##  $ pos_depth: num  18.9 19.4 18.8 18.5 18.5 ...
##  $ Station  : chr  "st.10" "st.10" "st.10" "st.10" ...

Station 15 Load and format data:

imagesize15 <- read.csv("~/Desktop/MiraiDPI/st15DPI/st15imagesize.csv", header = FALSE)
imagesize15$V1 <- substring(imagesize15$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize15$V2), " "))
imagesize15$height <- as.numeric(split[,2])
imagesize15$width <- as.numeric(split[,4])
imagesize15<- imagesize15[,c(1,5,6)]
names(imagesize15) <- c("photos", "height", "width")
imagesizedepth15<- merge(st15aCTD,imagesize15, by = "photos" )
imagesizedepth15$height <- (imagesizedepth15$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth15$width <- (imagesizedepth15$width * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth15$size <- imagesizedepth15$height * imagesizedepth15$width
imagesizedepth15$pos_depth <- imagesizedepth15$Depth * -1
imagesizedepth15$Station <- "st.15"
str(imagesizedepth15)
## 'data.frame':    524 obs. of  7 variables:
##  $ photos   : chr  "20170606150930" "20170606150930" "20170606150931" "20170606150933" ...
##  $ Depth    : num  -12.2 -12.2 -11.8 -11.7 -11.7 ...
##  $ height   : num  1.282 0.9 0.652 1.597 1.89 ...
##  $ width    : num  1.215 0.585 0.63 0.81 1.823 ...
##  $ size     : num  1.558 0.526 0.411 1.294 3.445 ...
##  $ pos_depth: num  12.2 12.2 11.8 11.7 11.7 ...
##  $ Station  : chr  "st.15" "st.15" "st.15" "st.15" ...
imagesizedepth_all <- rbind(imagesizedepth10, imagesizedepth15,imagesizedepth17, imagesizedepth3)
imagesizedepth_all$Station <- factor(imagesizedepth_all$Station, levels = c("st.10", "st.3", "st.15", "st.17"))
depthsize_all<- ggplot(imagesizedepth_all, aes(x=pos_depth, y = size)) + geom_point() +  theme_bw() +theme(text = element_text(size=12)) + xlab("Depth (m)") +ylab(expression(Acantharian~ROI~area~(mm^2))) 
depthsize_all

depthsize_all + facet_grid(~Station) + theme(strip.background = element_rect(colour='black', fill='white'))

ggsave("/Users/brisbin/Desktop/MiraiDPI/sizebydepth.png", height = 3.5, width =7)